arXiv:1505.06379v1 [cs.SY] 23 May 2015




Communication-Free Distributed Coverage for Networked Systems 

A. Yasin Yazicioglu, Magnus Egerstedt, and Jeff S. Shamma 


Abstract—In this paper, we present a communication-free
algorithm for distributed coverage of an arbitrary network by
a group of mobile agents with local sensing capabilities. The 
network is represented as a graph, and the agents are arbitrarily 
deployed on some nodes of the graph. Any node of the graph 
is covered if it is within the sensing range of at least one agent. 
The agents are mobile devices that aim to explore the graph 
and to optimize their locations in a decentralized fashion by 
relying only on their sensory inputs. We formulate this problem 
in a game theoretic setting and propose a communication-free 
learning algorithm for maximizing the coverage. 

I. Introduction 

In many networked systems, a typical task is to provide some service such as security or maintenance via some agents with limited capabilities (e.g., [1], [2], [3]). One way of achieving this task is to solve a locational optimization problem (e.g., [4], [5], [6], [7], [8]) and let each agent serve some part of the network around its assigned position. In the absence of a centralized mechanism, the agents are faced with a distributed coverage control problem, where their objective is to optimize their locations by following some decentralized controllers.

Distributed coverage control is widely studied on continuous 
domains (e.g., il-illoi). One possible approach is to employ 
potential fields to drive each agent away from the nearby
agents and obstacles (e.g., 0, ini). Alternatively, a prevailing
approach introduced in ifT^ is to model the underlying locational optimization problem as a continuous p-median problem
and to employ Lloyd's algorithm. As such, the agents
are driven onto a local optimum, i.e., a centroidal Voronoi
configuration, where each point in the space is assigned to
the nearest agent, and each agent is located at the center of 
mass of its own region. Later on, this method was extended 
for agents with distance-limited sensing and communications 
(e.g., 01) and limited power (e.g., 03), as well as for 
heterogeneous agents covering non-convex regions (e.g., 03 ). 
Also, the requirement of sensing density functions was relaxed 
by incorporating methods from adaptive control and learning 
(e.g., 01). 

Distributed coverage control has also been studied
on discrete spaces represented as graphs (e.g., 03 , 03 , 
ifn , El). One possible approach is to achieve a centroidal 
Voronoi partition of the graph via pairwise gossip algorithms 
(e.g., 03 ) or via asynchronous greedy updates (e.g., 03 ). 


Alternatively, distributed coverage control on discrete spaces 
can be studied in a game theoretic framework (e.g., 03, 
El). Game theoretic methods have been used to solve many 
cooperative control problems such as vehicle-target assignment 
(e.g., ED), dynamic vehicle routing (e.g. 123), cooperative 
communication (e.g., E3), and coverage optimization (e.g., 
03, Gl). In 03, sensors with variable footprints achieve 
power-aware optimal coverage on a discretized space. In El,
a group of heterogeneous mobile agents is driven on a graph
to maximize the number of covered nodes. 

In this paper, we study a distributed coverage control 
problem on graphs in a game theoretic setting. In this problem, 
mobile agents are arbitrarily deployed on an unknown graph. 
Each agent is assumed to sense the local graph structure and 
the presence of other agents (if any) within its sensing range. 
Any node of the graph is covered if it is within the sensing 
range of at least one agent. The objective of the agents is to 
maximize the number of covered nodes by optimizing their 
locations on the graph. We present a game theoretic formulation for this coverage control problem. We particularly focus
on a communication-free setting, where each agent should be 
driven via only its sensory inputs. In that case, the agents do 
not observe their exact utilities in the corresponding game. 
Accordingly, we propose a learning algorithm for driving the 
agent positions based on some estimated utilities. Using the 
proposed method, the agents maintain optimal coverage with 
an arbitrarily high probability as time goes to infinity.

The organization of this paper is as follows: Section II presents the distributed graph coverage problem. Section III sets up the game-theoretic formulation of the problem. Section IV presents a solution that requires some explicit communications among the agents. The proposed communication-free solution is presented in Section V. Some simulation results for the proposed method are presented in Section VI. Finally, Section VII concludes the paper.


II. Distributed Graph Coverage

In this section, we present the distributed graph coverage 
(DGC) problem, where the goal is to maximize the number 
of covered nodes by driving the agents with limited sensing 
and mobility capabilities to optimal locations on a graph. First, 
some graph theory preliminaries are presented. 
A. Graph Theory Concepts

An undirected graph $\mathcal{G} = (V, E)$ consists of a node set $V$ and an edge set $E \subseteq V \times V$. For an undirected graph, the edge set consists of unordered node pairs $(v, v')$ denoting that the nodes $v$ and $v'$ are adjacent.

A path is a sequence of nodes such that each node is adjacent to the preceding node in the sequence. For any two nodes $v$ and $v'$, the distance between the nodes, $d(v, v')$, is the number of edges in a shortest path between $v$ and $v'$. A graph is connected if the distance between any pair of nodes is finite.

The set of nodes containing a node $v$ and all the nodes adjacent to $v$ is called the (closed) neighborhood of $v$, and it is denoted as $\mathcal{N}_v$. For any $\delta > 0$, the $\delta$-neighborhood of $v$, $\mathcal{N}_v^\delta$, is the set of nodes that are at most $\delta$ away from $v$, i.e.,

$$\mathcal{N}_v^\delta = \{v' \in V \mid d(v, v') \le \delta\}. \tag{1}$$

For any $\mathcal{G} = (V, E)$ and any node subset $V_s \subseteq V$, the subgraph induced by $V_s$ consists of the nodes in $V_s$ and the edges in $E$ whose endpoints are both in $V_s$.


B. Problem Formulation 

Consider a connected undirected graph, $\mathcal{G} = (V, E)$, and let $I = \{1, 2, \dots, m\}$ denote a set of $m$ mobile agents arbitrarily deployed on some nodes of the graph. Let each agent have a sensing range, $\delta$. We assume that each agent, $i$, can sense the subgraph induced by the nodes in $\mathcal{N}^\delta_{v_i}$ and the presence of other agents (if any) within its $\delta$-neighborhood. As such, each agent located at $v_i \in V$ covers all the nodes in $\mathcal{N}^\delta_{v_i}$. Any node of the graph is covered if it is included in the $\delta$-neighborhood of at least one agent, and the set of covered nodes, $V_c \subseteq V$, is given as

$$V_c(v_1, \dots, v_m) = \bigcup_{i=1}^{m} \mathcal{N}^\delta_{v_i}. \tag{2}$$
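For simulation purposes, the covered set in (2) can be computed by a breadth-first search from each agent position, truncated at depth $\delta$. The sketch below is only illustrative: it assumes a centralized view of the graph stored as a Python adjacency dictionary (a hypothetical representation, since the agents in the DGC problem never hold this global view), and the helper names `delta_neighborhood` and `covered_nodes` are ours.

```python
from collections import deque


def delta_neighborhood(adj, v, delta):
    """Return N_v^delta from (1): every node within hop distance delta of v."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if dist[u] == delta:
            continue  # nodes at distance delta are kept but not expanded further
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return set(dist)


def covered_nodes(adj, positions, delta):
    """Return V_c(v_1, ..., v_m) from (2): the union of the agents' delta-neighborhoods."""
    covered = set()
    for v in positions:
        covered |= delta_neighborhood(adj, v, delta)
    return covered


# Hypothetical example: a path graph 0-1-2-3-4-5-6 and two agents with delta = 1.
line_graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 6], 6: [5]}
print(sorted(covered_nodes(line_graph, [1, 5], delta=1)))  # [0, 1, 2, 4, 5, 6]
```

Since the graph is unweighted, breadth-first search distances coincide with the hop distances $d(v, v')$, so truncating the search at depth $\delta$ reproduces (1) exactly.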

The objective in the distributed graph coverage (DGC) problem is to have the agents update their positions over time in a distributed manner to maximize the number of covered nodes, i.e.,

$$|V_c(v_1(t), \dots, v_m(t))|, \tag{3}$$

where each $v_i(t) \in V$ is the position of agent $i$ at time $t$.

In order to achieve optimal coverage in a distributed fashion, 
the agents need some local rules to follow. In general, a rule 
is considered to be local if its execution by an agent requires 
only some information available within a small distance from 
the agent. In this paper, we consider discrete-time dynamics, and we assume that each agent can either maintain its position or move to an adjacent node in the next time step, i.e.,

$$d(v_i(t+1), v_i(t)) \le 1, \quad \forall i \in \{1, 2, \dots, m\}. \tag{4}$$


C. Solution Approach

In the DGC problem, a group of mobile agents explore an unknown graph and aim to cover as many nodes as possible. As such, the underlying locational optimization problem is similar to the maximum coverage problem (e.g., Q, 0). Such NP-hard problems are typically tackled by finding sufficiently good approximate solutions through fast algorithms (e.g., [24], ES, ESI). Similarly, in many distributed coverage control studies, a locational objective function is optimized by the agents aiming for the best local improvements (e.g., m-iH). Such a distributed greedy approach can be employed to solve the DGC problem. Accordingly, the agents may move locally on the graph to maximally improve their local coverage. In that case, the resulting performance would significantly depend on the graph structure and the initial configuration. This method may rapidly lead to a reasonable approximate solution if the agents start with a sufficiently good initial coverage or if the interaction graph satisfies some structural properties. However, it may also lead to arbitrarily poor configurations for arbitrary graphs and initial conditions. For instance, consider the scenario in Fig. 1, where 2 agents with sensing ranges $\delta = 1$ can achieve a globally optimal configuration in 2 time steps. In this example, the initial configuration would be stationary under a greedy approach since none of the agents can improve the coverage by moving to a neighboring node. Note that the performance in Fig. 1 would be arbitrarily poor for any arbitrarily large graph obtained by adding more leaf nodes attached to the unoccupied hub.

Fig. 1. A possible trajectory to a globally optimal configuration for two agents on a small graph. The agents have sensing ranges $\delta = 1$, and they are initially located as in (a). The number of covered nodes (shown as non-white) is reduced in the intermediate step illustrated in (b) to reach the global optimum shown in (c).

In order to ensure efficient coverage for arbitrary graphs and initial configurations, a solution method should occasionally allow for graph exploration, even at the expense of temporarily reduced coverage. In this work, we present such a solution by approaching the problem from a game theoretic perspective. In particular, we map the DGC problem to a game, and we design a learning algorithm for the agents to follow in updating their actions.

III. Game Theoretic Formulation

In this section, a game-theoretic formulation of the DGC problem is presented. First, some game theory preliminaries are provided.


A. Game Theory Concepts 

A finite strategic game $\Gamma = (I, A, U)$ consists of three components: (1) a set of $m$ players (agents) $I = \{1, 2, \dots, m\}$, (2) an $m$-dimensional action space $A = A_1 \times A_2 \times \dots \times A_m$, where each $A_i$ is the action set of player $i$, and (3) a set of utility functions $U = \{U_1, U_2, \dots, U_m\}$, where each $U_i : A \to \mathbb{R}$ is a mapping from the action space to real numbers.

For any action profile $a \in A$, let $a_{-i}$ denote the actions of players other than $i$. Using this notation, an action profile $a$ can also be represented as $a = (a_i, a_{-i})$.

A class of games that is widely utilized in cooperative control problems is potential games. A game is called a potential game if there exists a potential function, $\phi : A \to \mathbb{R}$, such that the change of a player's utility resulting from its unilateral deviation from an action profile equals the resulting change in $\phi$. More precisely, for each player $i$, for every $a_i, a_i' \in A_i$, and for all $a_{-i} \in A_{-i}$,

$$U_i(a_i', a_{-i}) - U_i(a_i, a_{-i}) = \phi(a_i', a_{-i}) - \phi(a_i, a_{-i}). \tag{5}$$

When a cooperative control problem is mapped to a potential 
game, usually the game is designed such that its potential 
function captures the global objective of the control problem. 
Once such a potential game is designed, some game theoretic
learning algorithms such as log-linear learning [22] can be
utilized to drive the agent actions to the set of potential 
maximizers. 


B. DGC Game 

In order to formulate the DGC problem in a game theoretic setting, we design a corresponding game, $\Gamma_{DGC}$, by defining the action space and the utility functions. More specifically, we design a potential game such that its potential function, $\phi(a)$, captures the global objective of the DGC problem, i.e.,

$$\phi(a) = |V_c(a)|. \tag{6}$$

In the DGC problem, the coverage provided by each agent is determined by the position of the agent. Hence, the action of an agent can be defined as its position on the graph. Accordingly, each action set is equal to the node set of $\mathcal{G} = (V, E)$, i.e.,

$$A_i = V, \quad \forall i \in I. \tag{7}$$

Then, the utilities should be designed such that $\phi$ in (6) is indeed the potential function for the resulting game. To this end, we design the agent utilities as

$$U_i(a) = \sum_{v \in \mathcal{N}^\delta_{a_i}} u_i(v, a_{-i}), \tag{8}$$

where, for every $v \in \mathcal{N}^\delta_{a_i}$, $u_i(v, a_{-i})$ is the partial utility agent $i$ gathers by covering node $v$, and it is defined as

$$u_i(v, a_{-i}) = \begin{cases} 1 & \text{if } d(v, a_j) > \delta \ \ \forall j \neq i, \\ 0 & \text{otherwise.} \end{cases} \tag{9}$$


In the resulting game, each agent gathers a utility equal 
to the number of nodes that are covered only by itself. Note 
that this utility is equal to the marginal contribution of the 
corresponding agent to the number of covered nodes. 


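Lemma 3.1. The utilities in (8) lead to a potential game $\Gamma_{DGC} = (I, A, U)$ with the potential function given in (6).

Proof. Let $a_i = v_i$ and $a_i' = v_i'$ be two possible actions for agent $i$, and let $a_{-i}$ denote the actions of the other agents. Due to (2) and (6),

$$\phi(a) = \Big| \bigcup_{j \in I} \mathcal{N}^\delta_{a_j} \Big|. \tag{10}$$

By (9), $u_i(v, a_{-i}) = 1$ exactly when $v \notin \bigcup_{j \neq i} \mathcal{N}^\delta_{a_j}$. Hence, using (8), for any agent $i$, (10) can be expanded as

$$\phi(a) = \Big| \mathcal{N}^\delta_{a_i} \setminus \bigcup_{j \neq i} \mathcal{N}^\delta_{a_j} \Big| + \Big| \bigcup_{j \neq i} \mathcal{N}^\delta_{a_j} \Big| = U_i(a_i, a_{-i}) + \Big| \bigcup_{j \neq i} \mathcal{N}^\delta_{a_j} \Big|. \tag{11}$$

Since the last term in (11) does not depend on $a_i$, using (11) for any pair of actions $a_i$ and $a_i'$ yields

$$\phi(a_i', a_{-i}) - \phi(a_i, a_{-i}) = U_i(a_i', a_{-i}) - U_i(a_i, a_{-i}). \tag{12}$$

□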
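Lemma 3.1 can be sanity-checked numerically on a small graph by enumerating every action profile and every unilateral deviation and comparing both sides of (5). The brute-force sketch below is ours: the function names and the example graph are illustrative assumptions, and it is only practical for tiny graphs because it enumerates all $|V|^m$ profiles.

```python
from collections import deque
from itertools import product


def delta_neighborhood(adj, v, delta):
    """N_v^delta from (1), by depth-limited breadth-first search."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if dist[u] < delta:
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
    return set(dist)


def utility(adj, a, i, delta):
    """U_i(a) in (8): nodes in agent i's range that are not within delta of any other agent, per (9)."""
    others = set()
    for j, a_j in enumerate(a):
        if j != i:
            others |= delta_neighborhood(adj, a_j, delta)
    return len(delta_neighborhood(adj, a[i], delta) - others)


def potential(adj, a, delta):
    """phi(a) = |V_c(a)| in (6)."""
    covered = set()
    for a_j in a:
        covered |= delta_neighborhood(adj, a_j, delta)
    return len(covered)


def is_exact_potential(adj, m, delta):
    """Check (5) for every profile, every agent, and every unilateral deviation."""
    nodes = list(adj)
    for a in product(nodes, repeat=m):
        for i in range(m):
            for v in nodes:
                b = list(a)
                b[i] = v
                if (utility(adj, b, i, delta) - utility(adj, a, i, delta)
                        != potential(adj, b, delta) - potential(adj, a, delta)):
                    return False
    return True


graph = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0, 4], 4: [3]}
print(is_exact_potential(graph, m=2, delta=1))  # True
```

A `False` result would point to an implementation bug rather than to a counterexample, since (11) shows that the identity holds for every graph.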


C. Learning 

In game theoretic learning, starting from an arbitrary initial configuration, the agents repetitively play a game. At each step $t \in \{0, 1, 2, \dots\}$, each agent $i \in I$ plays an action $a_i(t)$ and receives some utility $U_i(a(t))$. In this setting, the agents update their actions in accordance with some learning algorithms. For the DGC problem, the learning process is desired to drive the agent positions to the set of configurations that maximize the number of covered nodes.

For potential games, a learning algorithm known as log-linear learning (LLL) can be used to drive the agents to action profiles that maximize the potential function $\phi(a)$ lIZTl. Essentially, LLL is a noisy best-response algorithm, and it induces a Markov chain over the action space with a unique limiting distribution, $\mu^*_\epsilon$, where $\epsilon$ denotes the noise parameter. As the noise parameter, $\epsilon$, goes down to zero, the limiting distribution, $\mu^*_\epsilon$, has an arbitrarily large part of its mass over the set of potential maximizers EH. However, LLL assumes that at any round each player $i$ has access to all the actions in its action set $A_i$. In general, LLL may not provide potential maximization when the system evolves over constrained action sets, i.e., when each agent $i$ is allowed to choose its next action from only a subset of actions. Note that this is indeed the case for the DGC problem, where each agent has to pick its next action from the closed neighborhood of its current action $a_i$,

$$A_i^c(a_i) = \mathcal{N}_{a_i}, \quad \forall i \in I. \tag{13}$$

The issue of constrained action sets was addressed in [28], and a variant learning algorithm called binary log-linear learning (BLLL) was presented for such cases.

In learning algorithms, typically each agent is assumed to 
measure its current utility. For instance, in order to execute 
LLL or BLLL, the agents need to measure their utilities 
resulting from their current actions as well as the hypothetical 
utilities they may gather by unilaterally switching to some 
other actions. Alternatively, a payoff-based implementation may be utilized to avoid the necessity to compute the hypothetical utilities ESll. Note that, for $\Gamma_{DGC}$, even the computation of the current utility requires some explicit communication since the agents with overlapping coverage are not necessarily within the sensing range of each other. In general, such agents can be up to $2\delta$ apart on the graph.

D. Stochastic Stability Concepts 

For potential games, noisy best-response algorithms such as 
LLL or BLLL induce a regular perturbed Markov chain over 
the action space such that the stochastically stable states are 
the potential maximizers. The concept of stochastic stability
will be extensively used in the remainder of this paper. Hence, 
we provide some preliminaries prior to our derivations. 

Definition (Regular Perturbed Markov Chain): Let $P$ be the transition matrix of a discrete-time Markov chain over a finite state space $X$. A perturbed Markov chain, $P_\epsilon$, with the noise parameter $\epsilon$ is called a regular perturbed Markov chain if

1) $P_\epsilon$ is aperiodic and irreducible for $\epsilon > 0$,

2) $\lim_{\epsilon \to 0} P_\epsilon = P$,

3) For any $x, x^+ \in X$, if $P_\epsilon(x, x^+) > 0$, then there exists $R(x, x^+) \ge 0$ such that

$$0 < \lim_{\epsilon \to 0^+} \frac{P_\epsilon(x, x^+)}{\epsilon^{R(x, x^+)}} < \infty, \tag{14}$$

where $R(x, x^+)$ is called the resistance of the transition from $x$ to $x^+$.
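By (14), the resistance is the exponent with which a transition probability vanishes, so it can be estimated as the slope of $\log P_\epsilon(x, x^+)$ against $\log \epsilon$ for small $\epsilon$. The sketch below applies this to a hypothetical transition probability $3\epsilon^2/(1+\epsilon)$ of our own choosing (not taken from the paper); the function names are also ours.

```python
import math


def estimate_resistance(p, eps1=1e-6, eps2=1e-7):
    """Approximate R in (14) as the slope of log p(eps) versus log eps near zero."""
    return (math.log(p(eps1)) - math.log(p(eps2))) / (math.log(eps1) - math.log(eps2))


def p_example(eps):
    """Hypothetical transition probability P_eps(x, x+) = 3 eps^2 / (1 + eps)."""
    return 3 * eps**2 / (1 + eps)


R = estimate_resistance(p_example)
print(round(R, 3))  # 2.0
# With R = 2, the ratio in (14) tends to the finite positive constant 3.
print(p_example(1e-9) / (1e-9) ** 2)
```

The multiplicative constant 3 affects only the limit in (14), not the resistance, which is why constant factors such as path-selection probabilities can be ignored when resistances are computed.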

Any regular perturbed Markov chain, $P_\epsilon$, has a unique limiting distribution, $\mu^*_\epsilon$, since it is aperiodic and irreducible.

Definition (Stochastically Stable State): Let $P_\epsilon$ denote a regular perturbed Markov chain over a state space, $X$. Any state, $x \in X$, is stochastically stable if

$$\lim_{\epsilon \to 0^+} \mu^*_\epsilon(x) > 0. \tag{15}$$

The stochastically stable states of a regular perturbed Markov chain, $P_\epsilon$, can be characterized through a resistance tree analysis. For any $x \in X$, a spanning tree rooted at $x$, $T_x$, is a directed graph, where the nodes correspond to states, directed edges correspond to some feasible state transitions, and there is a unique directed path on $T_x$ from any state $x' \neq x$ to $x$. The resistance of such a tree, $R(T_x)$, is defined as the sum of the resistances of its edges, where the resistance of each edge is given as in (14). A tree $T_x^*$ is called a minimum resistance tree if $R(T_x^*) \le R(T_x)$ for any $T_x$, i.e., any spanning tree rooted at $x$ has at least as much resistance as $T_x^*$. The stochastic potential of a state, $x$, is defined as the total resistance of its minimum resistance tree, $R(T_x^*)$. The following result characterizes the stochastically stable states through their stochastic potentials.

Lemma 3.2. [29] Let $P_\epsilon$ be a regular perturbed Markov chain. Any $x \in X$ is stochastically stable if and only if $x$ is a recurrent state of the unperturbed chain, $P$, with the minimum stochastic potential.
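Lemma 3.2 can be observed numerically on a toy chain of our own construction (not from the paper) with two states $x$ and $y$: $x$ moves to $y$ with probability $\epsilon^2$ and $y$ moves to $x$ with probability $\epsilon$. At $\epsilon = 0$ both states are recurrent; the only tree rooted at $x$ is $y \to x$ with resistance 1, while the only tree rooted at $y$ has resistance 2, so only $x$ should be stochastically stable. In closed form, $\mu^*_\epsilon(x) = 1/(1+\epsilon)$; the power-iteration helper below is a generic sketch with names of our choosing.

```python
def stationary_distribution(P, iters=20_000):
    """Approximate the limiting distribution mu*_eps of a finite chain by power iteration."""
    n = len(P)
    mu = [1.0 / n] * n
    for _ in range(iters):
        mu = [sum(mu[j] * P[j][k] for j in range(n)) for k in range(n)]
    return mu


def perturbed_chain(eps):
    """Hypothetical chain on states [x, y] with R(x, y) = 2 and R(y, x) = 1."""
    return [[1 - eps**2, eps**2],
            [eps, 1 - eps]]


for eps in (0.1, 0.01, 0.001):
    mu = stationary_distribution(perturbed_chain(eps))
    print(eps, round(mu[0], 4))  # the mass on x grows toward 1 as eps shrinks
```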

IV. Coverage Maximization

In this section, we will briefly show that if all the agents follow BLLL in a repetitive play of $\Gamma_{DGC}$, then the stochastically stable states are the coverage maximizers. A more detailed presentation of this approach can be found in [20]. As stated earlier, this solution requires some local communications among the agents. In the next section, we will present a communication-free learning algorithm that can achieve the same limiting behavior as this method.

Algorithm I: Binary Log-linear Learning [28]

1: initialization: $\epsilon$ small, $a \in A$ arbitrary
2: repeat
3:    Pick a random $i \in I$, and a random $a_i' \in A_i^c(a_i)$.
4:    Compute $\alpha = \epsilon^{-U_i(a)}$, $\beta = \epsilon^{-U_i(a_i', a_{-i})}$.
5:    With probability $\frac{\beta}{\alpha + \beta}$, set $a_i = a_i'$.
6: end repeat

In BLLL, a single agent is randomly chosen at each time step. The selection of a single agent at each time step can be achieved (with a very high probability) without a centralized coordination by using methods such as the asynchronous time model proposed in [30]. The selected agent, assuming that all the other agents are stationary, updates its action depending on its current utility and the hypothetical utility it would receive by playing a random action in its constrained action set. This is illustrated in Fig. 2.

[Fig. 2 annotations: $U_1(a_1, a_2) = 2$, $U_1(a_1', a_2) = 3$.]

Fig. 2. An illustration of the BLLL algorithm, where two agents with $\delta = 1$ are located as in (a) and Agent 1 is updating its action. Agent 1 randomly picks a candidate action, $a_1' \in A_1^c(a_1)$, as in (b). Its next action is picked from $\{a_1, a_1'\}$ with probabilities depending on the corresponding utilities. For the configuration in (b), the tiled node is not providing any utility to either of the agents since it is covered by both of them.

In [28], it was shown that BLLL can be used to achieve potential maximization if the constrained action sets satisfy Properties 1 and 2 provided below.

Property 1 (Reachability): For any agent $i \in I$ and any action pair $a_i^0, a_i^k \in A_i$, there exists a sequence of actions $\{a_i^0, a_i^1, \dots, a_i^k\}$ such that $a_i^r \in A_i^c(a_i^{r-1})$ for all $r \in \{1, 2, \dots, k\}$.

Property 2 (Reversibility): For any agent $i \in I$ and any action pair $a_i, a_i' \in A_i$,

$$a_i' \in A_i^c(a_i) \iff a_i \in A_i^c(a_i').$$


Theorem 4.1. [28] Consider any finite potential game and constrained action sets satisfying Properties 1 and 2. If all players adhere to BLLL, then the stochastically stable states are the set of potential maximizers.


In light of Theorem 4.1, the agents can maximize the coverage by following the BLLL algorithm in a repetitive play of $\Gamma_{DGC}$ if the constrained action sets given in (13) satisfy Properties 1 and 2. Lemma 4.2 shows that the constrained action sets indeed satisfy these properties if the graph to be covered is connected.

Lemma 4.2. The constrained action sets in (13) satisfy Properties 1 and 2 if the graph $\mathcal{G} = (V, E)$ is connected.

Proof. If the graph is connected, then there exists a finite-length path $\{v^0, v^1, \dots, v^k\}$ between any pair of nodes $v^0, v^k \in V$, and Property 1 is satisfied. Furthermore, for undirected graphs, $d(v, v') = d(v', v)$. Hence, Property 2 is also satisfied.

□
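Properties 1 and 2 for the constrained action sets in (13) can also be checked mechanically on a given graph. The sketch below assumes the graph is stored as a Python adjacency dictionary (our assumption, as are the function names) and tests reachability by breadth-first search over constrained moves and reversibility by exhaustive comparison. On any connected undirected graph both checks succeed, consistent with Lemma 4.2.

```python
from collections import deque


def closed_neighborhood(adj, v):
    """Constrained action set A_i^c(v) = N_v from (13): v together with its neighbors."""
    return {v, *adj[v]}


def has_reachability(adj):
    """Property 1: every node can be reached from every other node through constrained moves."""
    nodes = set(adj)
    for start in adj:
        seen = {start}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in closed_neighborhood(adj, u) - seen:
                seen.add(w)
                queue.append(w)
        if seen != nodes:
            return False
    return True


def has_reversibility(adj):
    """Property 2: w is in A^c(v) if and only if v is in A^c(w)."""
    return all((w in closed_neighborhood(adj, v)) == (v in closed_neighborhood(adj, w))
               for v in adj for w in adj)


connected = {0: [1], 1: [0, 2], 2: [1]}
disconnected = {0: [1], 1: [0], 2: []}
print(has_reachability(connected), has_reversibility(connected))        # True True
print(has_reachability(disconnected), has_reversibility(disconnected))  # False True
```

Reversibility fails only if the adjacency lists are not symmetric, which cannot happen for an undirected graph; reachability is exactly the connectivity assumption of Lemma 4.2.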


Theorem 4.3. Let $\mathcal{G} = (V, E)$ be a connected graph, and let all agents follow BLLL in a repetitive play of $\Gamma_{DGC}$ with the constrained action sets in (13). Then the stochastically stable states are the maximizers of $|V_c(a)|$.

Proof. If $\mathcal{G} = (V, E)$ is connected, then the constrained action sets in (13) satisfy Properties 1 and 2 due to Lemma 4.2. Hence, in light of Theorem 4.1, if all agents follow BLLL in a repetitive play of $\Gamma_{DGC}$, the stochastically stable states are the potential maximizers. Due to (6), those are the configurations maximizing the number of covered nodes, $|V_c(a)|$. □
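A compact simulation of the communication-based approach of Theorem 4.3 is sketched below. It is a centralized stand-in written for illustration only: the global adjacency dictionary, the function names, and the parameter values are our assumptions, and evaluating `utility` for the current and candidate actions implicitly performs the explicit communication that Section V seeks to avoid.

```python
import random
from collections import deque


def delta_neighborhood(adj, v, delta):
    """N_v^delta from (1), by depth-limited breadth-first search."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if dist[u] < delta:
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
    return set(dist)


def utility(adj, a, i, delta):
    """U_i(a) in (8): nodes covered by agent i and by no other agent."""
    others = set()
    for j, a_j in enumerate(a):
        if j != i:
            others |= delta_neighborhood(adj, a_j, delta)
    return len(delta_neighborhood(adj, a[i], delta) - others)


def blll(adj, a, delta, eps, steps, rng):
    """Run Algorithm I on Gamma_DGC with A_i^c(a_i) = N_{a_i}; return the final profile."""
    a = list(a)
    for _ in range(steps):
        i = rng.randrange(len(a))                   # a single randomly chosen agent
        trial = a.copy()
        trial[i] = rng.choice([a[i], *adj[a[i]]])  # uniform over the closed neighborhood (13)
        u_cur = utility(adj, a, i, delta)
        u_new = utility(adj, trial, i, delta)
        # beta / (alpha + beta) with alpha = eps^-u_cur and beta = eps^-u_new
        if rng.random() < 1.0 / (1.0 + eps ** (u_new - u_cur)):
            a = trial
    return a


line_graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 6], 6: [5]}
final = blll(line_graph, [0, 0], delta=1, eps=1e-12, steps=3000, rng=random.Random(0))
print(final)
```

Writing the acceptance probability as $1/(1+\epsilon^{U_i(a_i', a_{-i}) - U_i(a)})$ is algebraically identical to $\beta/(\alpha+\beta)$ and avoids forming the two large powers separately. With such a tiny $\epsilon$ the run behaves almost like a randomized better-response dynamics; on this particular path graph that suffices to reach 6 covered nodes, whereas on graphs like the one in Fig. 1 a larger $\epsilon$ is needed for exploration.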


V. Communication-Free Coverage Maximization

In the DGC problem, the sensory inputs do not reveal which of the nodes within the sensing range of an agent are also covered by some other agents. However, each agent can sense if any other agent is also covering its current position, as illustrated in Fig. 3. Hence, each agent $i$ observes the partial utility, $u_i(a_i, a_{-i})$ in (9), via its sensory input.



Fig. 3. Distributed graph coverage by agents with sensing ranges $\delta = 1$. Agents 1 and 2 do not observe that the encircled node is covered by both of them. However, each of them knows that its current position is covered only by itself since no other agent is within its sensing range.


Since the exact utilities in $\Gamma_{DGC}$ are not measurable without explicit communications, the agents need to update their actions based on some estimated utilities. Assuming that the nearby agents will remain stationary for a sufficient amount of time, each agent $i$ can construct an estimated utility by visiting each $v \in \mathcal{N}^\delta_{a_i}$ and combining the sampled $u_i(v, a_{-i})$. Note that the resulting estimation will not necessarily be equal to the actual utility since multiple agents may be moving simultaneously, as illustrated in Fig. 4. However, if the probability of having simultaneously moving agents is sufficiently small, then false estimations will be sufficiently rare for the agents to achieve the desired limiting behavior by following a noisy best-response based on the estimated utilities. The proposed communication-free algorithm is based on this approach.



Fig. 4. Two agents with sensing ranges $\delta = 1$ are located on a graph as in (a). Part of the graph that is not sensed by agent 1 is dashed in the figures. Agent 1 can estimate its utility in (a) by sampling the partial utilities from the nodes in its sensing range. If agent 2 is stationary in the meantime, then the resulting estimation will be true. However, if agent 2 is also moving, then the sampled partial utilities may be true as in (b) or false as in (c).

In the remainder of this section, we present the proposed communication-free coverage maximization algorithm (CFCM) and an analysis of the corresponding dynamics.


A. CFCM Algorithm 

The proposed algorithm has two parameters to be set. The first parameter, $\epsilon$, is the noise in the agent decisions when choosing between the candidate actions based on the corresponding estimated utilities. The second parameter, $r$, sets the likelihood of each agent to update its action. As will be shown later in this section, the desired global behavior emerges when $r$ is sufficiently large and $\epsilon$ is small.

In CFCM, each agent $i$ is either stationary or experimenting. Each stationary agent repeats its current action in the next time step with a high probability, $1 - \epsilon^r$, or starts an experiment with probability $\epsilon^r$. An experiment involves comparing its current action, $a_i^1$, to an alternative randomly picked from its constrained action set, $a_i^2 \in A_i^c(a_i^1)$, where $A_i^c(a_i^1)$ is the local neighborhood of $a_i^1$ as given in (13). In this aspect, the agent behavior is similar to the payoff-based BLLL in [·]. However, since the agents receive only some partial utilities, $u_i(a_i, a_{-i})$, an experiment consists of visiting all the nodes in $\mathcal{N}^\delta_{a_i^1} \cup \mathcal{N}^\delta_{a_i^2}$ to see which of those nodes are also covered by some other agents. We refer to the corresponding path to be traversed as an experiment path between $a_i^1$ and $a_i^2$.


Definition (Experiment Path): Let $\delta$ be the sensing range of the agents. For any $a_i^1$ and $a_i^2 \in A_i^c(a_i^1)$, a finite path, $\{a_i^1, \dots, a_i^2\}$, is an experiment path if it traverses $\mathcal{N}^\delta_{a_i^1} \cup \mathcal{N}^\delta_{a_i^2}$.

For any $a_i^1$ and $a_i^2 \in A_i^c(a_i^1)$, an experiment path can be obtained locally by utilizing methods such as depth-first search or breadth-first search (e.g., lUTJ). In the CFCM algorithm, an experiment path between $a_i^1$ and $a_i^2$ is denoted as $\mathcal{E}(a_i^1, a_i^2)$. During an experiment, the agent traverses its experiment path to construct the estimated utilities, $\tilde U_i^1$ and $\tilde U_i^2$, from the sampled partial utilities. For simplicity, a partial utility from a node is sampled only at the last visit to that node during the experiment. As such, if it is the agent's last visit to its current position, $a_i$, and the agent does not sense any other agent within $\delta$, then the utility estimations corresponding to the candidate actions within $\delta$ from $a_i$ are incremented by 1. Once the experiment path is traversed, the agent randomly chooses between the two candidate actions based on the estimated utilities, $\tilde U_i^1$ and $\tilde U_i^2$. At the next time step, the agent becomes stationary at its chosen action until it starts a new experiment.
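One possible construction of $\mathcal{E}(a_i^1, a_i^2)$ is a depth-first traversal of the subgraph induced by $\mathcal{N}^\delta_{a_i^1} \cup \mathcal{N}^\delta_{a_i^2}$, which is connected because every node within $\delta$ of $a_i^1$ (or $a_i^2$) is reached by a shortest path that stays inside the union and $a_i^1, a_i^2$ are adjacent. The sketch below is ours (the function names and the adjacency-dictionary input are assumptions) and computes the path with full knowledge of that subgraph. It backtracks to $a_i^1$ before the last step, which is simple but not necessarily the shortest experiment path.

```python
from collections import deque


def delta_neighborhood(adj, v, delta):
    """N_v^delta from (1), by depth-limited breadth-first search."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if dist[u] < delta:
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
    return set(dist)


def experiment_path(adj, a1, a2, delta):
    """One possible E(a1, a2): a walk from a1 to a2 that visits every node of
    N_a1^delta U N_a2^delta. Assumes a2 is a1 or a neighbor of a1, as in (13).
    Consecutive entries are adjacent, so every step respects (4)."""
    region = delta_neighborhood(adj, a1, delta) | delta_neighborhood(adj, a2, delta)
    walk, visited = [a1], {a1}

    def dfs(u):
        for w in adj[u]:
            if w in region and w not in visited:
                visited.add(w)
                walk.append(w)  # step forward to w
                dfs(w)
                walk.append(u)  # step back to u

    dfs(a1)  # depth-first traversal of the region; the walk ends back at a1
    if a2 != a1:
        walk.append(a2)  # one final step to the adjacent candidate a2
    return walk


line_graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 6], 6: [5]}
print(experiment_path(line_graph, 1, 2, delta=1))  # [1, 0, 1, 2, 3, 2, 1, 2]
```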

For the CFCM algorithm, the state of any agent $i$ can be defined as

$$x_i = [S_i, k_i, \tilde U_i^1, \tilde U_i^2], \tag{16}$$

where $S_i$ is a sequence of actions, which is either a singleton (stationary) or an experiment path (experimenting), $k_i \in \{1, \dots, |S_i|\}$ is an index variable denoting which action in $S_i$ is currently taken by the agent, and $\tilde U_i^1$, $\tilde U_i^2$ are the estimations for $U_i(a_i^1, a_{-i})$ and $U_i(a_i^2, a_{-i})$, respectively. In this representation, the current action, $a_i$, and the candidate actions, $a_i^1$ and $a_i^2$, are given as

$$a_i = S_i(k_i), \quad a_i^1 = S_i(1), \quad a_i^2 = S_i(|S_i|), \tag{17}$$

where $S_i(k_i)$ denotes the $k_i$-th element in $S_i$, and $|S_i|$ denotes the length of $S_i$.

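Algorithm II: Communication-free Coverage Maximization (CFCM)

1: initialization: $\epsilon$ (small), $r$, $a_i \in A_i$ arbitrary, $S_i = \{a_i\}$, $k_i = 1$, $\tilde U_i^1 = \tilde U_i^2 = 0$.
2: repeat
3:    $a_i = S_i(k_i)$, $a_i^1 = S_i(1)$, $a_i^2 = S_i(|S_i|)$.
4:    if ($|S_i| = 1$)
5:       Generate a random (uniform) $\gamma \in [0, 1]$.
6:       if ($\gamma \le \epsilon^r$)
7:          $a_i^2$ is randomly (uniform) chosen over $A_i^c(a_i^1)$.
8:          $S_i = \mathcal{E}(a_i^1, a_i^2)$.
9:       end if
10:   else
11:      if ($k_i \ge k, \ \forall k \in \{k \mid S_i(k) = a_i\}$)
12:         $\tilde U_i^1 = \tilde U_i^1 + u_i(a_i, a_{-i})$, if $a_i \in \mathcal{N}^\delta_{a_i^1}$.
13:         $\tilde U_i^2 = \tilde U_i^2 + u_i(a_i, a_{-i})$, if $a_i \in \mathcal{N}^\delta_{a_i^2}$.
14:      end if
15:      if ($k_i = |S_i|$)
16:         $\alpha = \epsilon^{-\tilde U_i^1}$, $\beta = \epsilon^{-\tilde U_i^2}$.
17:         $S_i = \{a_i^1\}$ w.p. $\frac{\alpha}{\alpha + \beta}$, $S_i = \{a_i^2\}$ otherwise.
18:         $k_i = 1$, $\tilde U_i^1 = \tilde U_i^2 = 0$.
19:      else
20:         $k_i = k_i + 1$.
21:      end if
22:   end if
23: end repeat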

The CFCM algorithm is memoryless since, given the current states, the state of every agent in the next time step is independent of its past trajectory. As such, if all agents follow the CFCM algorithm, then a Markov chain is induced over the state space, $X$, where each $x \in X$ is the global state obtained by concatenating the states of all agents, i.e.,

$$x = [x_1, x_2, \dots, x_m]. \tag{18}$$

In the remainder of this section, the limiting behavior of the 
resulting Markov chain will be inspected through a stochastic 
stability analysis. 
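The estimation and decision steps of Algorithm II (lines 11 to 17) can be isolated as two small functions. In this sketch the sensing is replaced by a hypothetical pre-recorded list `observed`, where `observed[k]` is the partial utility $u_i(a_i, a_{-i})$ the agent sensed at step $k$ of its experiment path; the function names are ours and the example data are invented for illustration.

```python
def estimate_utilities(path, observed, nbhd1, nbhd2):
    """Return the estimates (U~_i^1, U~_i^2) accumulated over one experiment.

    path[k]     -- node occupied at step k of the experiment path S_i
    observed[k] -- partial utility sensed there: 1 if no other agent within delta, else 0
    nbhd1/nbhd2 -- the delta-neighborhoods of a_i^1 = path[0] and a_i^2 = path[-1]
    Only the last visit to each node is used, as in lines 11-13 of Algorithm II.
    """
    last_visit = {node: k for k, node in enumerate(path)}  # later visits overwrite earlier ones
    u1 = u2 = 0
    for node, k in last_visit.items():
        if node in nbhd1:
            u1 += observed[k]
        if node in nbhd2:
            u2 += observed[k]
    return u1, u2


def prob_second_candidate(u1, u2, eps):
    """beta / (alpha + beta) with alpha = eps**-u1 and beta = eps**-u2 (lines 16-17)."""
    return 1.0 / (1.0 + eps ** (u2 - u1))


path = [1, 0, 1, 2, 3, 2, 1, 2]
observed = [1, 1, 1, 0, 0, 1, 1, 1]  # node 2 looked shared at step 3, but only its last visit counts
u1, u2 = estimate_utilities(path, observed, nbhd1={0, 1, 2}, nbhd2={1, 2, 3})
print(u1, u2)                              # 3 2
print(prob_second_candidate(u1, u2, 0.1))  # about 1/11
```

Because the lower estimate is chosen with probability proportional to $\epsilon^{\tilde U_i^* - \tilde U_i^2}$ as $\epsilon \to 0$, this is where the denied utility $\Delta_i$ in (27) enters the resistance.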

B. Limiting Behavior 

For any $x \in X$, the agents can be grouped into two distinct sets consisting of the stationary agents, $I_s(x)$, and the experimenting agents, $I_e(x)$, as

$$I_s(x) = \{i \in I \mid |S_i| = 1\}, \tag{19}$$

$$I_e(x) = I \setminus I_s(x). \tag{20}$$

Using these sets, for any feasible transition, $x \to x^+$, the agents can be grouped into 4 disjoint sets based on the transition of their individual states:

$$I_{ss}(x, x^+) = I_s(x) \cap I_s(x^+), \tag{21}$$

$$I_{se}(x, x^+) = I_s(x) \cap I_e(x^+), \tag{22}$$

$$I_{ee}(x, x^+) = I_e(x) \cap I_e(x^+), \tag{23}$$

$$I_{es}(x, x^+) = I_e(x) \cap I_s(x^+), \tag{24}$$

where $I_{ss}(x, x^+)$ are the agents that remain stationary, $I_{se}(x, x^+)$ are the ones starting to experiment, $I_{ee}(x, x^+)$ are the experimenting agents that have not completed moving along their experiment paths, and $I_{es}(x, x^+)$ are the agents that have completed traversing their experiment paths and choose between their candidate actions.

The agents in $I_{es}(x, x^+)$ can be further partitioned as the ones choosing their first candidate action and the ones that choose their second candidate action, i.e.,

$$I_{es}^1(x, x^+) = \{i \in I_{es}(x, x^+) \mid a_i^+ = a_i^1\}, \tag{25}$$

$$I_{es}^2(x, x^+) = \{i \in I_{es}(x, x^+) \mid a_i^+ = a_i^2\}. \tag{26}$$

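Note that the agents in $I_{es}(x, x^+)$ do not necessarily choose the action resulting in the higher estimated utility. For each $i \in I$, let $\tilde U_i^* = \max\{\tilde U_i^1, \tilde U_i^2\}$. Then, the amount of estimated utility that is denied in the transition $x \to x^+$ is given as

$$\Delta_i(x_i, x_i^+) = \begin{cases} \tilde U_i^* - \tilde U_i^1 & \text{if } i \in I_{es}^1(x, x^+), \\ \tilde U_i^* - \tilde U_i^2 & \text{if } i \in I_{es}^2(x, x^+), \\ 0 & \text{otherwise.} \end{cases} \tag{27}$$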

Next, we show that the CFCM algorithm induces a regular perturbed Markov chain, where the resistance of any feasible transition depends on the estimated utilities denied by the agents becoming stationary and the number of agents starting new experiments.

Lemma 5.1. Let $\mathcal{G} = (V, E)$ be a connected graph. If all agents employ the CFCM algorithm, then a regular perturbed Markov chain is induced over $X$, and the resistance of any feasible transition, $x \to x^+$, is

$$R(x, x^+) = r\,|I_{se}(x, x^+)| + \sum_{i \in I_{es}(x, x^+)} \Delta_i(x_i, x_i^+). \tag{28}$$
Proof. Let $P^{\varepsilon}$ denote the transition matrix of the Markov chain induced by the CFCM algorithm. For $\varepsilon > 0$, any all-stationary state can be reached from any other all-stationary state through a sequence of experiments, given that $\mathcal{G} = (V, E)$ is connected. Furthermore, any state that is not all-stationary lies on a feasible path between two all-stationary states. Hence, $P^{\varepsilon}$ is irreducible. Moreover, since the stationary agents remain stationary with probability $1 - \varepsilon^{r}$, aperiodicity immediately follows from the resulting self-loops at all-stationary states.

The probability of any feasible transition from $x$ to $x^+$ in $P^{\varepsilon}$ is the joint probability of the state transitions of the individual agents. Note that for any agent $i \in I_{ee}(x, x^+)$, the transition from $x_i$ to $x_i^+$ does not have any randomness. Hence, the probability of the transition from $x$ to $x^+$ is

$$P^{\varepsilon}(x, x^+) = \Pr[I_{es}(x, x^+)]\,\Pr[I_{ss}(x, x^+)]\,\Pr[I_{ee}(x, x^+)]\,\Pr[I_{se}(x, x^+)], \tag{29}$$

where each term on the right side of (29) denotes the joint probability of the state transitions for the agents in the corresponding subset, and they are given as

$$\Pr[I_{es}(x, x^+)] = \prod_{i \in I_{es}(x, x^+)} \frac{\varepsilon^{-\tilde{U}_i^{+}}}{\varepsilon^{-\tilde{U}_i^{1}} + \varepsilon^{-\tilde{U}_i^{2}}}, \tag{30}$$

$$\Pr[I_{ss}(x, x^+)] = \prod_{i \in I_{ss}(x, x^+)} (1 - \varepsilon^{r}), \tag{31}$$

$$\Pr[I_{ee}(x, x^+)] = 1, \tag{32}$$

$$\Pr[I_{se}(x, x^+)] = \prod_{i \in I_{se}(x, x^+)} \varepsilon^{r}\, \Pr[S_i^+; a_i^1, a_i^2], \tag{33}$$

where $\tilde{U}_i^{+} \in \{\tilde{U}_i^{1}, \tilde{U}_i^{2}\}$ is the estimated utility of the position at which agent $i$ becomes stationary, and $\Pr[S_i^+; a_i^1, a_i^2]$ is the probability of having $S_i^+$ as the experiment path for an agent comparing $a_i^1$ and $a_i^2$. $\Pr[S_i^+; a_i^1, a_i^2]$ depends on the function $\xi(a_i^1, a_i^2)$, and it is independent of $\varepsilon$. Plugging (30)-(33) into (29), one can verify that the resistance $R(x, x^+)$ given in (28) satisfies

$$0 < \lim_{\varepsilon \to 0^+} \frac{P^{\varepsilon}(x, x^+)}{\varepsilon^{R(x, x^+)}} < \infty. \tag{34}$$

□
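To see how the resistance (28) arises from the limit (34), the short Python sketch below evaluates the probability of one transition and divides it by $\varepsilon^{R(x, x^+)}$. It is an illustration under stated assumptions, not the CFCM implementation: a stationary agent is assumed to start an experiment with probability $\varepsilon^r$ times a hypothetical $\varepsilon$-independent path probability of 0.5, and an agent ending its experiment is assumed to settle on a candidate with probability proportional to $\varepsilon^{-\tilde{U}}$. The helper names and the utility values are hypothetical.

```python
def settle_probability(eps, u_chosen, u_other):
    """Assumed rule: settle on a candidate with probability proportional to eps**(-U)."""
    return eps ** (-u_chosen) / (eps ** (-u_chosen) + eps ** (-u_other))


def transition_probability(eps, r, n_start, settles, n_stay=0, path_prob=0.5):
    """Joint probability of one feasible transition x -> x+ (hypothetical model):
    n_start agents start experiments (each w.p. eps**r times an eps-independent
    path probability), n_stay agents remain stationary (each w.p. 1 - eps**r),
    agents that keep experimenting contribute a factor of 1, and each
    (u_chosen, u_other) pair in `settles` is an agent becoming stationary."""
    p = (eps ** r * path_prob) ** n_start * (1 - eps ** r) ** n_stay
    for u_chosen, u_other in settles:
        p *= settle_probability(eps, u_chosen, u_other)
    return p


def resistance(r, n_start, settles):
    """R(x, x+) = r |I_se| + sum of denied estimated utilities, as in (28)."""
    return r * n_start + sum(max(u_chosen, u_other) - u_chosen
                             for u_chosen, u_other in settles)


r = 1.5
settles = [(2, 5), (4, 1)]  # hypothetical (chosen, other) estimated utilities
R = resistance(r, n_start=1, settles=settles)
for eps in (1e-2, 1e-4, 1e-6):
    ratio = transition_probability(eps, r, 1, settles, n_stay=3) / eps ** R
    print(f"eps = {eps:g}: P / eps^R = {ratio:.6f}")
```

In this example one agent starts an experiment (cost $r = 1.5$), one settles on the worse candidate and denies $5 - 2 = 3$, and one settles on the better candidate and denies nothing, so $R = 4.5$. As $\varepsilon$ decreases, the printed ratio approaches the $\varepsilon$-independent factor 0.5, which is the finite, positive limit that (34) requires.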


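Since the CFCM algorithm induces a regular perturbed Markov chain, the stochastically stable states are the recurrent states of the unperturbed chain with the minimum stochastic potential, as given in Lemma 3.2. Note that if $\varepsilon = 0$, then no agent starts an experiment. In that case, the set of recurrent states, $X^0$, consists of the all-stationary states. All the other states, where at least one agent is experimenting, form the set of transient states, $X^{\varphi}$, i.e.

$$X^0 = \{x \mid I_s(x) = I\}, \tag{35}$$

$$X^{\varphi} = X \setminus X^0. \tag{36}$$

The stochastic potentials of the states in $X^0$ are determined by the resistances of the feasible transitions. Note that the parameter $r$ in the CFCM algorithm has a direct influence on the resistances as given in (28). We will show that, for any connected graph $\mathcal{G}$, if $r$ is sufficiently large, then the states in $X^0$ with the minimum stochastic potential are the coverage maximizers. To provide a sufficient value of $r$, we first relate the structure of the graph to the maximum amount of estimated utility that can be denied by an agent in any feasible transition under the CFCM algorithm.

Lemma 5.2. Let all agents follow the CFCM algorithm to cover a connected graph, $\mathcal{G} = (V, E)$, and let $\nu(\mathcal{G})$ be

$$\nu(\mathcal{G}) = \max_{(v, v') \in E} |\mathcal{N}_v^{\delta} \setminus \mathcal{N}_{v'}^{\delta}|. \tag{37}$$

Then, for any feasible transition $x \to x^+$,

$$\nu(\mathcal{G}) \geq \max_{i \in I} \Delta_i(x_i, x_i^+). \tag{38}$$

Proof. Let $x \to x^+$ be a feasible transition. For any $i \in I_s(x)$, $\tilde{U}_i^1 = \tilde{U}_i^2 = 0$. On the other hand, for any $i \in I_e(x)$, the sampled partial utilities from the nodes in $\mathcal{N}_{a_i^1}^{\delta} \cap \mathcal{N}_{a_i^2}^{\delta}$ contribute equally to both $\tilde{U}_i^1$ and $\tilde{U}_i^2$. Hence,

$$\max\{|\mathcal{N}_{a_i^1}^{\delta} \setminus \mathcal{N}_{a_i^2}^{\delta}|,\ |\mathcal{N}_{a_i^2}^{\delta} \setminus \mathcal{N}_{a_i^1}^{\delta}|\} \geq |\tilde{U}_i^1 - \tilde{U}_i^2|, \quad \forall i \in I. \tag{39}$$

In light of (27) and (39),

$$\max\{|\mathcal{N}_{a_i^1}^{\delta} \setminus \mathcal{N}_{a_i^2}^{\delta}|,\ |\mathcal{N}_{a_i^2}^{\delta} \setminus \mathcal{N}_{a_i^1}^{\delta}|\} \geq \Delta_i(x_i, x_i^+), \quad \forall i \in I. \tag{40}$$

Since $(a_i^1, a_i^2) \in E$ for any $i \in I_e(x)$, (37) implies

$$\nu(\mathcal{G}) \geq \max\{|\mathcal{N}_{a_i^1}^{\delta} \setminus \mathcal{N}_{a_i^2}^{\delta}|,\ |\mathcal{N}_{a_i^2}^{\delta} \setminus \mathcal{N}_{a_i^1}^{\delta}|\}, \quad \forall i \in I. \tag{41}$$

Finally, (40) and (41) together imply (38). □

Next, we show that $r > \nu(\mathcal{G})$ is a sufficient condition to ensure that the paths between the states in $X^0$ on a minimum resistance tree consist of unilateral experimentations.

Definition (Unilateral Experimentation Path): A feasible sequence of states, $\mathcal{P} = \{x^1, x^2, \ldots, x^n\}$, is a unilateral experimentation path if $x^1, x^n \in X^0$, $x^2, \ldots, x^{n-1} \in X^{\varphi}$, and for all $1 \leq p \leq n-1$,

$$|I_{se}(x^p, x^{p+1})| = \begin{cases} 1, & \text{if } p = 1, \\ 0, & \text{otherwise}. \end{cases} \tag{42}$$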

Lemma 5.3. Let $T^*$ be a minimum resistance tree, and let $x \to x^+ \in T^*$. If $x \in X^0$, then $|I_{se}(x, x^+)| = 1$.

Proof. Since $x \in X^0$, $|I_{se}(x, x^+)| > 0$, as otherwise $x^+ = x$ and $x \to x^+$ cannot be contained in a tree. Assume that $|I_{se}(x, x^+)| > 1$. Then, choose an arbitrary $i \in I_{se}(x, x^+)$ to define an $\hat{x}^+ \neq x$ as

$$\hat{x}_j^+ = \begin{cases} x_j^+, & \text{if } j = i, \\ x_j, & \text{otherwise}. \end{cases} \tag{43}$$

Note that $x \to \hat{x}^+$ is a feasible transition, and $R(x, \hat{x}^+) = R(x, x^+) - r(|I_{se}(x, x^+)| - 1)$. Replacing $x \to x^+$ with $x \to \hat{x}^+$ would give an alternative tree with a smaller resistance, which contradicts $T^*$ being a minimum resistance tree.

□








Lemma 5.4. Let $T^*$ be a minimum resistance tree, and let $x \to x^+ \in T^*$. If $x \in X^{\varphi}$ and $r > \nu(\mathcal{G})$, then $|I_{se}(x, x^+)| < |I_e(x)|$.

Proof. Since $x \in X^{\varphi}$, $I_{se}(x, \hat{x}^+) = \emptyset$ does not imply $\hat{x}^+ = x$. Hence, there exists an $\hat{x}^+ \neq x$ such that $x \to \hat{x}^+$ is feasible and $I_{se}(x, \hat{x}^+) = \emptyset$. For any such $\hat{x}^+$, (28) and Lemma 5.2 yield

$$R(x, \hat{x}^+) - R(x, x^+) \leq -r\,|I_{se}(x, x^+)| + |I_{es}(x, \hat{x}^+)|\,\nu(\mathcal{G}). \tag{44}$$

Note that $|I_{es}(x, \hat{x}^+)| \leq |I_e(x)|$. Hence, given $r > \nu(\mathcal{G})$, the right side of (44) is negative for any $|I_{se}(x, x^+)| \geq |I_e(x)|$. In that case, replacing $x \to x^+$ with $x \to \hat{x}^+$ would give an alternative tree with a smaller resistance, which contradicts $T^*$ being a minimum resistance tree. Consequently, $|I_{se}(x, x^+)| < |I_e(x)|$. □

Lemma 5.5. Let $r > \nu(\mathcal{G})$, and let $\mathcal{P} = \{x^1, x^2, \ldots, x^n\}$ be a sequence of states, where $x^1, x^n \in X^0$ and $x^2, \ldots, x^{n-1} \in X^{\varphi}$. If $\mathcal{P}$ is contained in some minimum resistance tree $T^*$, then $\mathcal{P}$ is a unilateral experimentation path.

Proof. Since $x^1 \in X^0$, Lemma 5.3 gives $|I_{se}(x^1, x^2)| = 1$, leading to $|I_e(x^2)| = 1$. Furthermore, for $r > \nu(\mathcal{G})$, Lemma 5.4 gives $|I_{se}(x^2, x^3)| = 0$. Hence, $|I_e(x^3)| \leq 1$. Applying Lemma 5.4 recursively along $\mathcal{P}$, we obtain

$$|I_{se}(x^p, x^{p+1})| = \begin{cases} 1, & \text{if } p = 1, \\ 0, & \text{otherwise}. \end{cases} \tag{45}$$

Hence, $\mathcal{P}$ is a unilateral experimentation path. □

Lemma 5.6. If $\mathcal{P} = \{x^1, x^2, \ldots, x^n\}$ is a unilateral experimentation path, then

$$R(\mathcal{P}) = \sum_{p=1}^{n-1} R(x^p, x^{p+1}) = r + \max\{\phi(x^n), \phi(x^1)\} - \phi(x^n). \tag{46}$$

Proof. Since $\mathcal{P}$ is a unilateral experimentation path, for $x^p, x^{p+1} \in X^{\varphi}$ we have

$$|I_{se}(x^p, x^{p+1})| = |I_{es}(x^p, x^{p+1})| = 0. \tag{47}$$

Hence, such transitions have zero resistance, resulting in

$$R(\mathcal{P}) = R(x^1, x^2) + R(x^{n-1}, x^n). \tag{48}$$

Note that, since $x^1 \in X^0$ and $\mathcal{P}$ is a unilateral experimentation path, we have $R(x^1, x^2) = r$ and $R(x^{n-1}, x^n) = \Delta_i(x_i^{n-1}, x_i^n)$, where $i \in I$ is the unique experimenting agent. Since all the other agents are stationary, i.e., $a_{-i}$ is constant along $\mathcal{P}$, the estimated utilities satisfy

$$\tilde{U}_i^1 = \sum_{v \in \mathcal{N}_{a_i^1}^{\delta}} u(v, a_{-i}) = U_i(a_i^1, a_{-i}), \tag{49}$$

$$\tilde{U}_i^2 = \sum_{v \in \mathcal{N}_{a_i^2}^{\delta}} u(v, a_{-i}) = U_i(a_i^2, a_{-i}). \tag{50}$$

Plugging (49) and (50) into (27), we obtain

$$\Delta_i(x_i^{n-1}, x_i^n) = \max\{U_i(x^n), U_i(x^1)\} - U_i(x^n). \tag{51}$$

Since $\Gamma_{DGC}$ is a potential game, any unilateral change in $U_i$ equals the corresponding change in the potential $\phi$, and we obtain

$$\Delta_i(x_i^{n-1}, x_i^n) = \max\{\phi(x^n), \phi(x^1)\} - \phi(x^n). \tag{52}$$

Combining (48) and (52) yields (46). □
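The last step of Lemma 5.6 rests on the potential-game property, under which a unilateral change in $U_i$ equals the change in $\phi$. The Python sketch below checks this numerically and then evaluates (46). It assumes that $\mathcal{N}_v^{\delta}$ is the set of nodes within $\delta$ hops of $v$, that $U_i$ counts the nodes covered by agent $i$ and by no other agent, and that $\phi$ counts all covered nodes ($\phi(x) = |V_c(x)|$, as used in Theorem 5.8). The cycle graph, agent positions, and helper names are hypothetical.

```python
import random
from collections import deque


def ball(adj, v, delta):
    """Nodes within delta hops of v (assumed definition of N_v^delta)."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if dist[u] < delta:
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
    return set(dist)


def covered(adj, positions, delta):
    """Set of nodes covered by at least one agent."""
    nodes = set()
    for p in positions:
        nodes |= ball(adj, p, delta)
    return nodes


def phi(adj, positions, delta):
    """Potential: the number of covered nodes, |V_c|."""
    return len(covered(adj, positions, delta))


def utility(adj, positions, i, delta):
    """U_i: the number of nodes covered by agent i and by no other agent."""
    others = positions[:i] + positions[i + 1:]
    return len(ball(adj, positions[i], delta) - covered(adj, others, delta))


def path_resistance(r, phi_start, phi_end):
    """R(P) of a unilateral experimentation path, following (46)."""
    return r + max(phi_end, phi_start) - phi_end


# Hypothetical example: a 12-node cycle, delta = 1, three agents, r = 1.5.
n, delta, r = 12, 1, 1.5
adj = {v: [(v - 1) % n, (v + 1) % n] for v in range(n)}
rng = random.Random(0)
for _ in range(200):
    positions = [rng.randrange(n) for _ in range(3)]
    i = rng.randrange(3)
    moved = list(positions)
    moved[i] = rng.choice(adj[positions[i]])  # unilateral move to a neighbor
    d_utility = utility(adj, moved, i, delta) - utility(adj, positions, i, delta)
    d_phi = phi(adj, moved, delta) - phi(adj, positions, delta)
    assert d_utility == d_phi  # exact potential game property used for (52)

start, end = [0, 1, 6], [0, 2, 6]  # agent 1 settles at node 2 instead of node 1
print(path_resistance(r, phi(adj, start, delta), phi(adj, end, delta)))  # 1.5
print(path_resistance(r, phi(adj, end, delta), phi(adj, start, delta)))  # 2.5
```

The forward move raises coverage from 7 to 8 nodes, so only the cost $r$ of starting the experiment is paid; the reverse move denies one unit of utility and costs $r + 1$.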


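Lemma 5.7. Let $r > \nu(\mathcal{G})$, and let $T_x^*$ and $T_{x'}^*$ be minimum resistance trees rooted at some $x, x' \in X^0$. Then,

$$R(T_x^*) < R(T_{x'}^*) \Rightarrow \phi(x) > \phi(x'). \tag{53}$$

Proof. For $r > \nu(\mathcal{G})$, in light of Lemma 5.5, the paths between the states in $X^0$ on a minimum resistance tree consist of unilateral experimentations. Let $x_0 \in X^0$, and let $T_{x_0}^*$ be a minimum resistance tree rooted at $x_0$. Let $x_0' \in X^0$ be a state such that $R(T_{x_0}^*) < R(T_{x_0'}^*)$, and let the unique path $\mathcal{P} \in T_{x_0}^*$ from $x_0'$ to $x_0$ consist of $n$ unilateral experimentations, i.e.

$$R(\mathcal{P}) = \sum_{k=1}^{n} R(\mathcal{P}_k), \tag{54}$$

where $\mathcal{P}_k$ is the unilateral experimentation path starting at $x^{t_k} \in X^0$ and ending at $x^{t_{k+1}} \in X^0$, with $x^{t_1} = x_0'$ and $x^{t_{n+1}} = x_0$. Note that, for each such $\mathcal{P}_k$, there exists a feasible unilateral experimentation path $\mathcal{P}_k'$ in the reversed direction, starting at $x^{t_{k+1}}$ and ending at $x^{t_k}$. Replacing each $\mathcal{P}_k$ with $\mathcal{P}_k'$, one can construct a tree $T_{x_0'}$ rooted at $x_0'$. By Lemma 5.6, the resistances of these trees satisfy

$$\begin{aligned} R(T_{x_0'}) - R(T_{x_0}^*) &= \sum_{k=1}^{n} \big(R(\mathcal{P}_k') - R(\mathcal{P}_k)\big) \\ &= \sum_{k=1}^{n} \big(\phi(x^{t_{k+1}}) - \phi(x^{t_k})\big) \\ &= \phi(x_0) - \phi(x_0'). \end{aligned} \tag{55}$$

Note that, by definition, $R(T_{x_0'}^*) \leq R(T_{x_0'})$. Hence, if $R(T_{x_0}^*) < R(T_{x_0'}^*)$, then $R(T_{x_0}^*) < R(T_{x_0'})$ for any such $T_{x_0'}$. Plugging this into (55), we obtain $\phi(x_0) > \phi(x_0')$.

□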
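The telescoping step in (55) can be checked numerically. The Python sketch below assumes only the closed form (46) for the resistance of each unilateral experimentation path; it draws random, purely hypothetical potential values along a chain of all-stationary states and confirms that reversing every segment changes the total resistance by $\phi(x_0) - \phi(x_0')$, independently of $r$.

```python
import random


def path_resistance(r, phi_start, phi_end):
    """Resistance of a unilateral experimentation path, following (46)."""
    return r + max(phi_end, phi_start) - phi_end


def reversal_change(r, phis):
    """Left side of (55): sum over segments of R(P'_k) - R(P_k), where the k-th
    segment P_k goes from phis[k] to phis[k + 1] and P'_k is its reversal."""
    return sum(path_resistance(r, b, a) - path_resistance(r, a, b)
               for a, b in zip(phis, phis[1:]))


# Hypothetical potentials along the path from x_0' (first entry) to the
# root x_0 (last entry); each consecutive pair is one unilateral experimentation.
rng = random.Random(1)
r = 1.5
for _ in range(1000):
    phis = [rng.randint(0, 50) for _ in range(rng.randint(2, 8))]
    assert reversal_change(r, phis) == phis[-1] - phis[0]
print("telescoping identity holds on all samples")
```

Each reversed segment costs $\phi(x^{t_{k+1}}) - \phi(x^{t_k})$ more than the original one, because the two resistances in (46) share $r$ and the $\max$ term and differ only in the subtracted potential.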


Theorem 5.8. Let $\mathcal{G} = (V, E)$ be a connected graph. Let all agents follow the CFCM algorithm with $r > \nu(\mathcal{G})$, and let $x$ be a stochastically stable state of the resulting Markov chain. Then, $x \in X^0$ and

$$|V_c(x)| \geq |V_c(x')|, \quad \forall x' \in X^0. \tag{56}$$

Proof. Let $x$ be a stochastically stable state. Due to Lemma 3.2, $x \in X^0$ and $R(T_x^*) \leq R(T_{x'}^*)$ for all $x' \in X^0$. In light of Lemma 5.7, if $r > \nu(\mathcal{G})$, then $R(T_x^*) \leq R(T_{x'}^*)$ implies $\phi(x) \geq \phi(x')$ for all $x' \in X^0$. As such, (56) is satisfied since $\phi(x) = |V_c(x)|$. □


Theorem 5.8 indicates that if all agents follow the CFCM algorithm with sufficiently large $r$, then the stochastically stable states are the all-stationary states maximizing the number of covered nodes. As such, the agents asymptotically maintain maximum coverage with an arbitrarily high probability for arbitrarily small values of the noise parameter $\varepsilon$.
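On a very small graph, the configurations singled out by Theorem 5.8 can be listed by brute force. The Python sketch below assumes that $\mathcal{N}_v^{\delta}$ is the set of nodes within $\delta$ hops of $v$ and ignores the stationary/experimenting flags, so it enumerates agent positions only; the 7-node path with two agents is a hypothetical example, not the simulated graph.

```python
from collections import deque
from itertools import combinations_with_replacement


def ball(adj, v, delta):
    """Nodes within delta hops of v (assumed definition of the delta-neighborhood)."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if dist[u] < delta:
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
    return set(dist)


def num_covered(adj, positions, delta):
    """|V_c|: the number of nodes within delta hops of at least one agent."""
    return len(set().union(*(ball(adj, p, delta) for p in positions)))


def coverage_maximizers(adj, n_agents, delta):
    """Brute-force search over placements; agents are treated as interchangeable."""
    best, argmax = -1, []
    for positions in combinations_with_replacement(sorted(adj), n_agents):
        c = num_covered(adj, positions, delta)
        if c > best:
            best, argmax = c, [positions]
        elif c == best:
            argmax.append(positions)
    return best, argmax


# Hypothetical example: a path 0-1-...-6, two agents, delta = 1.
path = {v: [w for w in (v - 1, v + 1) if 0 <= w < 7] for v in range(7)}
print(coverage_maximizers(path, 2, 1))
# (6, [(1, 4), (1, 5), (2, 5)])
```

The enumeration grows combinatorially with the number of agents and nodes, which is why it is only useful as a sanity check on toy graphs; the point of CFCM is to reach such maximizers through local, communication-free moves.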
VI. Simulation Results

In this section, some simulation results are presented to demonstrate the performance of the proposed method. In the simulation, a group of 13 agents is initially placed at an arbitrary node of a connected random geometric graph. Each agent has a sensing range $\delta = 1$. The graph consists of 50 nodes and 78 edges, and it has $\nu(\mathcal{G}) = 4$. Note that $r > \nu(\mathcal{G})$ is a sufficient condition for the stochastic stability of the potential maximizers due to the sufficiently high resistance of simultaneous experiments, as given in Lemma 5.3. However, $r > \nu(\mathcal{G})$ may not be necessary in many cases, since simultaneously updating agents do not necessarily influence each other's utility estimations, especially when they are sufficiently far from each other. In this simulation, the agents follow the CFCM algorithm with $\varepsilon = 0.015$ and $r = 1.5$.
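The reported value $\nu(\mathcal{G}) = 4$ comes from the simulated graph, which is not listed here. The Python sketch below only illustrates how (37) can be evaluated on a given graph, assuming that $\mathcal{N}_v^{\delta}$ is the set of nodes within $\delta$ hops of $v$; the adjacency-dict representation, the helper names, and the two small example graphs are hypothetical.

```python
from collections import deque


def ball(adj, v, delta):
    """Nodes within delta hops of v, found by breadth-first search
    (assumed definition of the delta-neighborhood N_v^delta)."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if dist[u] < delta:
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
    return set(dist)


def nu(adj, delta):
    """nu(G) from (37): the largest |N_v^delta minus N_w^delta| over edges (v, w).
    Iterating over adj[v] visits each undirected edge in both orientations."""
    nbhd = {v: ball(adj, v, delta) for v in adj}
    return max(len(nbhd[v] - nbhd[w]) for v in adj for w in adj[v])


# Hypothetical examples, with delta = 1 as in the simulation.
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
path = {v: [w for w in (v - 1, v + 1) if 0 <= w < 5] for v in range(5)}
print(nu(star, 1), nu(path, 1))  # 3 1
```

On the star, moving from the hub to a leaf uncovers the three other leaves, so $\nu = 3$; on the path, any move along an edge trades at most one node. A value of $r$ above the computed $\nu(\mathcal{G})$ then meets the sufficient condition of Theorem 5.8 for that graph.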

All the agents are initially stationary at the same position on the graph. The number of covered nodes throughout a period of 200000 time steps is shown in Fig. 5, whereas the configuration of the agents on the graph at some instants is provided in Fig. 6. As depicted in Fig. 5, after a sufficient amount of time, the agents maintain complete coverage with a very high probability. For $150000 < t < 200000$, the average number of covered nodes at each time step is computed as 49.7.



Fig. 5. The number of covered nodes as a function of time (CFCM). 



Fig. 6. The configuration of 13 agents on the graph at some instants of 
the simulation (CFCM). The nodes occupied by at least one agent are black, 
the nodes covered by at least one agent are gray, and the nodes that are not
covered are white. 


In order to compare the performance with a setting that allows for communications, we also present a simulation of the same scenario with BLLL. The agents start at the same initial condition as in the previous simulation, and BLLL is executed with the same noise parameter $\varepsilon = 0.015$. The number of covered nodes throughout a period of 10000 time steps is shown in Fig. 7, whereas the configuration of the agents on the graph at some instants is provided in Fig. 8. As illustrated in Fig. 7, after a sufficient amount of time, the agents maintain complete coverage with a very high probability. For $7500 < t < 10000$, the average number of covered nodes at each time step is computed as 49.76.



Fig. 7. The number of covered nodes as a function of time (BLLL). 



Fig. 8. The configuration of 13 agents on the graph at some instants of 
the simulation (BLLL). The nodes occupied by at least one agent are black, 
the nodes covered by at least one agent are gray, and the nodes that are not 
covered are white. 


Through the comparison of Figs. 5 and 6 to Figs. 7 and 8, it is seen that both algorithms drive the agents to some global optima in a similar fashion. However, when the agents are allowed to communicate, they can maximize the coverage much faster, as one might expect. Despite the slower convergence to the limiting distribution, the main advantage of the CFCM algorithm is that the agents do not need to know their actual utilities, whose computation requires some communications in the DGC problem. As such, CFCM can be employed to optimally distribute some mobile security resources on networks, even in scenarios that do not allow for such explicit communications.
VII. Conclusion

In this paper, a game theoretic approach was proposed for 
distributed coverage of networked systems by mobile agents 
with local capabilities. We considered a distributed graph 
coverage (DGC) problem, where the network is modeled as an 
undirected graph, and the agents are located on some nodes 
of the graph. Each agent can sense the graph structure and the 
presence of the other agents within its $\delta$-neighborhood, where
$\delta$ is the sensing range. Any node of the graph is covered if it is
within the sensing range of at least one agent. The agents move 
locally on the graph, and they aim to maximize the number of 
covered nodes. We studied this problem particularly for agents 
with no explicit communications among themselves. 

A game theoretic formulation of the DGC problem was obtained by designing a potential game, $\Gamma_{DGC}$. In $\Gamma_{DGC}$, the action of each agent is defined as its position on the graph, and the utility of each agent is equal to the number of nodes covered only by itself. It was shown that $\Gamma_{DGC}$ can be paired with a learning algorithm such as BLLL to maximize the coverage. However, such learning algorithms require the agents to measure their current utilities. In $\Gamma_{DGC}$, the actual utilities cannot be computed without explicit communications, since the agents with overlapping coverage are not necessarily within the sensing range of each other. In order to address this issue, we presented a communication-free learning algorithm, namely CFCM. In CFCM, the agents follow a noisy best-response policy based on the estimated utilities gathered by moving around their current positions. The algorithm has a noise parameter, $\varepsilon$, and a second parameter, $r \in \mathbb{R}^+$, that sets the likelihood of remaining stationary. We showed that the CFCM algorithm induces a regular perturbed Markov chain and that the stochastically stable states are the coverage maximizers for sufficiently large values of $r$. A sufficient value of $r$ was derived from the topology of the graph. Some simulation results were also presented to demonstrate that the CFCM algorithm achieves optimal coverage.